Week 7 – Exploring Sampling
This week, we begin our adventure into statistical inference by learning about sampling. The concepts behind sampling form the basis of confidence intervals and hypothesis testing, which we’ll cover in the second portion of this week’s reading and next week. We will see that the skills you learned during the first three weeks of class, in particular data visualization and data wrangling, will also play an important role in the development of your understanding of sampling.
This week’s reading is a compilation of Chapter 7 and Chapter 8 from ModernDive (Kim et al. 2020), with a smattering of my own ideas.
0.1 Reading Guide
Download the reading guide as a Word Document here
Download the reading guide as an HTML file here
1 Sampling bowl activity
Let’s start with a hands-on activity.
1.1 What proportion of this bowl’s balls are red?
Take a look at the bowl in Figure 1. It has a certain number of red balls and a certain number of white balls, all of equal size. Furthermore, it appears the bowl has been mixed beforehand, as there does not seem to be any coherent pattern to the spatial distribution of the red and white balls.
Let’s now ask ourselves, what proportion of this bowl’s balls are red?
One way to answer this question would be to perform an exhaustive count: remove each ball individually, count the number of red balls and the number of white balls, and divide the number of red balls by the total number of balls. However, this would be a long and tedious process.
1.2 Using the shovel once
Instead of performing an exhaustive count, let’s insert a shovel into the bowl as seen in Figure 2. Using the shovel, let’s remove \(5 \times 10 = 50\) balls, as seen in Figure 3.
Observe that 17 of the 50 balls are red, and thus 17/50 = 0.34 = 34% of the shovel’s balls are red. We can view the proportion of balls that are red in this shovel as a guess of the proportion of balls that are red in the entire bowl. While not as exact as doing an exhaustive count of all the balls in the bowl, our guess of 34% took much less time and energy to make.
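For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of using a shovel once. The bowl's composition (400 red and 600 white balls) is made up purely for illustration, since the reading has not told us what is actually in the bowl, and the function name `shovel_proportion_red` is hypothetical.

```python
import random


def shovel_proportion_red(bowl, shovel_size=50, rng=None):
    """Scoop `shovel_size` balls without replacement; return the share that are red."""
    rng = rng or random.Random()
    shovel = rng.sample(bowl, shovel_size)  # each ball can be scooped at most once
    return shovel.count("red") / shovel_size


# The shovel described in the text: 17 red balls out of 5 x 10 = 50 slots.
observed = 17 / 50
print(observed)  # 0.34

# ASSUMPTION: a made-up bowl, not the real one in Figure 1.
bowl = ["red"] * 400 + ["white"] * 600

# One virtual scoop; the seed only makes the run repeatable.
guess = shovel_proportion_red(bowl, rng=random.Random(7))
print(guess)
```

The value of `guess` plays the same role as our 34%: it is computed from 50 balls only, yet we use it as a guess for the whole bowl.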
However, suppose we started this activity over from the beginning. In other words, we return the 50 balls to the bowl and start over. Would we remove exactly 17 red balls again? In other words, would our guess at the proportion of the bowl’s balls that are red be exactly 34% again? Maybe?
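One way to explore this question without a physical bowl is to repeat the scoop on a computer. The sketch below is only an illustration under stated assumptions: the bowl's contents (400 red, 600 white) are invented, and `repeat_shovel` is a hypothetical helper name. Because each scoop is drawn from the full bowl, it mimics returning the 50 balls before starting over.

```python
import random


def repeat_shovel(bowl, shovel_size=50, repetitions=10, seed=None):
    """Return the number of red balls seen in each of several independent scoops."""
    rng = random.Random(seed)
    red_counts = []
    for _ in range(repetitions):
        # Sampling leaves `bowl` unchanged, so every scoop starts from the
        # full bowl, as if the previous 50 balls had been put back.
        shovel = rng.sample(bowl, shovel_size)
        red_counts.append(shovel.count("red"))
    return red_counts


# ASSUMPTION: a made-up bowl, not the real one in Figure 1.
bowl = ["red"] * 400 + ["white"] * 600

red_counts = repeat_shovel(bowl, seed=1)
for i, n_red in enumerate(red_counts, start=1):
    print(f"shovel {i}: {n_red} red, proportion {n_red / 50:.2f}")
```

Running this, you will most likely see the count of red balls change from one scoop to the next, which suggests that getting exactly 17 red balls a second time is possible but far from guaranteed.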